
Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck

Neural Information Processing Systems

Adversarial examples, generated by carefully crafted perturbations, have attracted considerable research attention. Recent works have argued that the existence of robust and non-robust features is a primary cause of adversarial examples, and have investigated their internal interactions in the feature space. In this paper, we propose a way of explicitly distilling the feature representation into robust and non-robust features using the Information Bottleneck. Specifically, we inject noise into each feature unit and evaluate the information flow in the feature representation to dichotomize feature units as either robust or non-robust, based on the noise variation magnitude. Through comprehensive experiments, we demonstrate that the distilled features are highly correlated with adversarial prediction and carry human-perceptible semantic information by themselves. Furthermore, we present an attack mechanism that intensifies the gradient of non-robust features directly related to the model prediction, and validate its effectiveness in breaking model robustness.
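The dichotomization step described in the abstract can be sketched roughly as follows. The per-unit `noise_scale` is assumed to come from a learned information-bottleneck objective; the function name, threshold, and the mapping from small noise to "robust" are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def dichotomize_features(features, noise_scale, threshold=0.5):
    """Split feature units into robust / non-robust sets.

    features: (N, D) array of feature activations.
    noise_scale: (D,) per-unit noise magnitude, assumed to be learned by
        an information-bottleneck objective; a large tolerated noise
        suggests the unit carries little stable, label-relevant
        information under perturbation.
    Units whose noise magnitude is below `threshold` are treated here as
    robust; the rest as non-robust.
    """
    robust_idx = np.where(noise_scale < threshold)[0]
    nonrobust_idx = np.where(noise_scale >= threshold)[0]
    return features[:, robust_idx], features[:, nonrobust_idx]
```

In this sketch the split is a hard threshold on a single scalar per unit; the actual method evaluates information flow through the representation, so this only illustrates the shape of the output, not the selection criterion.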




The Interpretability Analysis of the Model Can Bring Improvements to the Text-to-SQL Task

Zhang, Cong

arXiv.org Artificial Intelligence

Currently, AI technology is profoundly transforming the database landscape. Text-to-SQL, by innovating data provisioning to cater to the information retrieval and data analysis needs of a broader audience of everyday users, is emerging as a catalyst for propelling databases towards greater efficiency, collaboration, and intelligence. In recent years, text-to-SQL solutions leveraging large autoregressive models have continually surpassed existing methods on benchmark datasets for multi-table complex queries (Zhu et al., 2024), such as Spider (Yu et al., 2018c) and BIRD (Li et al., 2023), attributed to their exceptional natural language understanding and generation capabilities. In reality, it is highly prevalent for users of reporting systems to conduct simple queries, statistical analyses, and evaluations on consolidated single-report data derived from multi-table integration and field augmentation within databases. The single-table query dataset exemplified by WikiSQL (Zhong et al., 2017) aligns well with this application scenario. Despite its relatively straightforward syntax and lesser complexity compared to datasets like Spider and BIRD (Deng et al., 2022), WikiSQL continues to serve as a pivotal benchmark for demonstrating the technical feasibility of converting natural language into simple SQL and validating the fundamental capabilities of models.



Relation learning in a neurocomputational architecture supports cross-domain transfer

Doumas, Leonidas A. A., Puebla, Guillermo, Martin, Andrea E., Hummel, John E.

arXiv.org Artificial Intelligence

People readily generalise prior knowledge to novel situations and stimuli. Advances in machine learning and artificial intelligence have begun to approximate and even surpass human performance in specific domains, but machine learning systems struggle to generalise information to untrained situations. We present a model that demonstrates human-like extrapolatory generalisation by learning and explicitly representing an open-ended set of relations characterising regularities within the domains it is exposed to. First, when trained to play one video game (e.g., Breakout), the model generalises to a new game (e.g., Pong) with different rules, dimensions, and characteristics in a single shot. Second, the model can learn representations from a different domain (e.g., 3D shape images) that support learning a video game and generalising to a new game in one shot. By exploiting well-established principles from cognitive psychology and neuroscience, the model learns structured representations without feedback, and without requiring knowledge of the relevant relations to be given a priori. We present additional simulations showing that the representations the model learns support cross-domain generalisation. The model's ability to generalise between different games demonstrates the flexible generalisation afforded by a capacity to learn not only statistical relations, but also other relations that are useful for characterising the domain to be learned. In turn, this kind of flexible, relational generalisation is only possible because the model is capable of representing relations explicitly, a capacity that is notably absent in extant statistical machine learning algorithms.


Recognition of Visually Perceived Compositional Human Actions by Multiple Spatio-Temporal Scales Recurrent Neural Networks

Lee, Haanvid, Jung, Minju, Tani, Jun

arXiv.org Artificial Intelligence

Abstract--The current paper proposes a novel neural network model for recognizing visually perceived human actions. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by introducing multiple-timescale recurrent dynamics into the conventional convolutional neural network model. One of the essential characteristics of the MSTRNN is that its architecture imposes both spatial and temporal constraints simultaneously on the neural activity, which varies over multiple scales among different layers. As suggested by the principle of upward and downward causation, it is assumed that the network can develop meaningful structures, such as a functional hierarchy, by taking advantage of such constraints during the course of learning. To evaluate the characteristics of the model, the current study uses three types of human action video datasets consisting of different types of primitive actions and different levels of compositionality over them. The performance of the MSTRNN on these datasets is compared with that of other representative deep learning models used in the field. Analysis of the internal representations obtained through learning clarifies what sorts of functional hierarchy can be developed by extracting the essential compositionality underlying the datasets. Recently, a convolutional neural network (CNN) [1], inspired by the mammalian visual cortex, showed remarkably better object image recognition performance than conventional vision recognition schemes that employ elaborately hand-coded visual features. A CNN trained with 1 million visual images from ImageNet [2] was able to classify hundreds of object images with an error rate of 6.67% [3], and demonstrated near-human performance [4]. However, CNNs operate on static inputs and do not model temporal structure; as a consequence, CNNs are less effective in handling video image patterns than static images. To address this shortcoming, a number of action recognition models have been developed.
H. Lee is with the Department of Electrical Engineering, Korea Institute of Science and Technology, Daejeon 305-701, Republic of Korea (email: haanvidlee@gmail.com). M. Jung is with the Department of Electrical Engineering, Korea Institute of Science and Technology, Daejeon 305-701, Republic of Korea (email: minju5436@gmail.com).
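The multiple-timescale recurrent dynamics mentioned in the abstract build on the standard leaky-integrator (continuous-time RNN) update, which can be sketched as below. This is the generic discretization such models build on, not the full MSTRNN architecture; the function and parameter names are illustrative:

```python
import numpy as np

def mtrnn_step(h, x, W_in, W_rec, tau):
    """One leaky-integrator RNN update with a per-layer time constant.

    h: (H,) current internal state; x: (D,) input.
    tau: time constant of the layer. A larger tau makes the units
    change slowly (a coarse timescale), while tau = 1 recovers a
    standard RNN step, so stacking layers with different tau values
    imposes different temporal constraints per layer.
    """
    u = W_in @ x + W_rec @ np.tanh(h)
    return (1.0 - 1.0 / tau) * h + (1.0 / tau) * u
```

In a multiple-timescale stack, lower layers would use small tau (fast dynamics for fine motion detail) and higher layers large tau (slow dynamics for action-level structure).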


A Principle for Unsupervised Hierarchical Decomposition of Visual Scenes

Mozer, Michael C.

Neural Information Processing Systems

Structure in a visual scene can be described at many levels of granularity. At a coarse level, the scene is composed of objects; at a finer level, each object is made up of parts, and the parts of subparts. In this work, I propose a simple principle by which such hierarchical structure can be extracted from visual scenes: Regularity in the relations among different parts of an object is weaker than in the internal structure of a part. This principle can be applied recursively to define part-whole relationships among elements in a scene. The principle does not make use of object models, categories, or other sorts of higher-level knowledge; rather, part-whole relationships can be established based on the statistics of a set of sample visual scenes. I illustrate with a model that performs unsupervised decomposition of simple scenes. The model can account for the results from a human learning experiment on the ontogeny of part-whole relationships.
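The statistical principle described in the abstract can be illustrated with simple co-occurrence statistics over binary scene descriptions: elements that appear and disappear together more regularly than a threshold are merged into the same candidate part. The representation, threshold, and union-find merging below are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def group_parts(scenes, threshold=0.8):
    """Group scene elements into candidate 'parts' by co-occurrence.

    scenes: (S, E) binary array; scenes[s, e] = 1 if element e is
    present in scene s. Pairs of elements whose presence agrees across
    scenes at a rate >= threshold are merged into one part, reflecting
    the idea that within-part regularity exceeds between-part
    regularity. Union-find keeps the merging transitive.
    """
    S, E = scenes.shape
    parent = list(range(E))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(E):
        for j in range(i + 1, E):
            agreement = np.mean(scenes[:, i] == scenes[:, j])
            if agreement >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for e in range(E):
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())
```

Applying the same grouping recursively to the resulting parts would yield the part-whole hierarchy the principle describes.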

